Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1871175: Add support for specifying a schema string for DataFrame.create_dataframe #2828

Merged
merged 5 commits into from
Jan 16, 2025

Conversation

sfc-gh-jdu
Copy link
Collaborator

@sfc-gh-jdu sfc-gh-jdu commented Jan 7, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1871175

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    Support schema string

Comment on lines +1071 to +1081
for i, c in enumerate(s):
if c in ["<", "("]:
bracket_depth += 1
elif c in [">", ")"]:
bracket_depth -= 1
if bracket_depth < 0:
raise ValueError(f"Mismatched bracket in '{s}'.")
Copy link
Contributor

@sfc-gh-aling sfc-gh-aling Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this bracket check logic has repeated multiple times
do you think it's possible to check the bracket match as the initial step for only one time for the whole input string, and then in the downstream logic we can only focus on extracting the names and types

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to parse bracket to split fields, and extract names and types anyway. There is indeed a duplicate of validating whether the bracket expression is valid or not, maybe we can remove it. But to make the function self-contained, maybe let's still keep it? They are also covered in the test.

Comment on lines 1705 to 1732
dt = type_string_to_type_object(
" col1 : int , col2 : map< string , decimal(5,2) > "
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dt = type_string_to_type_object(
" col1 : int , col2 : map< string , decimal(5,2) > "
)
dt = type_string_to_type_object(
" col1 : int , col2 : map< string , decimal( 5 , 2 ) > "
)

can we add spacing here too

tests/integ/test_dataframe.py Show resolved Hide resolved
Comment on lines +3066 to +3054
- When passing a **string**, it can be either an *explicit* struct
(e.g. ``"struct<a: int, b: string>"``) or an *implicit* struct
(e.g. ``"a: int, b: string"``). Internally, the string is parsed and
converted into a :class:`StructType` using Snowpark's type parsing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean a valid struct must contain col_name: data_type pair?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@sfc-gh-jdu sfc-gh-jdu force-pushed the jdu-SNOW-1871175-create-df-str branch from 64fe8b4 to 1a5e325 Compare January 14, 2025 21:02
@sfc-gh-jdu sfc-gh-jdu force-pushed the jdu-SNOW-1871175-create-df-str branch from 1a5e325 to 677e733 Compare January 14, 2025 21:05
Copy link

@sfc-gh-wshangguan sfc-gh-wshangguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change able to parse "not null", e.g. "struct struct<i: integer not null>"

@sfc-gh-jdu sfc-gh-jdu merged commit 46dc756 into main Jan 16, 2025
47 of 53 checks passed
@sfc-gh-jdu sfc-gh-jdu deleted the jdu-SNOW-1871175-create-df-str branch January 16, 2025 19:35
@github-actions github-actions bot locked and limited conversation to collaborators Jan 16, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants